
feat(retry): 429 rate-limit retry + multi-provider integration validation #10

Merged
aksOps merged 4 commits into main from feat/rate-limit-retry-and-multi-provider-validation
May 14, 2026

aksOps commented May 14, 2026

Contributor

Summary

v1.5-C follow-up. Three improvements caught while live-validating the per-agent provider story (intake on Ollama, downstream on OpenRouter):

  1. feat(retry): Free / shared upstream tiers (e.g. OpenRouter …:free models) throttle on windows that take 30-60s to clear. The existing 5xx backoff (1.5s/3s/4.5s, total ~9s) exhausts its retries before the window clears, so the 429 surfaces as EnvelopeMissingError or agent failed. Added a separate _RATE_LIMIT_MARKERS set and a longer rate_limit_base_delay (7.5s/15s/22.5s, total ~45s).
  2. test(integration): tests/test_integration_driver_s1.py was written in the Phase 15 (response_format JSON) era; its responder skill prompt lacked the Phase 22 markdown contract, so the live Ollama call hard-failed with EnvelopeMissingError. Added the contract block to the prompt. Also added an azure parametrize arm so the live verification covers all three production provider kinds. Skips are now per leg, so partial-key environments exercise whichever providers they can reach.
  3. chore(config): Switch llm.default to workhorse and point workhorse at inclusionai/ring-2.6-1t:free. Demonstrates the v1.5-C per-agent flow with two real providers in the same INC. Operators on a paid OpenRouter plan should swap back to a paid model.

Changes

| Commit | What |
| --- | --- |
| c638352 | _ainvoke_with_retry two-regime backoff + 5 new tests |
| c8da236 | S1 driver: markdown contract in prompt + Azure leg + per-leg skip |
| 7d29cf0 | config/config.yaml default → free OpenRouter model |
| 1df2072 | dist regeneration for the retry change |

Test plan

  • `uv run ruff check src/ tests/`: clean
  • `uv run pytest -x`: 1265 passed, 8 skipped (was 1260, added 5)
  • `tests/test_integration_driver_s1.py::…[local]`: PASSES end-to-end against Ollama Cloud gpt-oss:20b
  • dist bundles regenerated

Live verification matrix (with this dev environment's .env)

| Leg | Result | Reason if not green |
| --- | --- | --- |
| local (Ollama Cloud) | ✅ pass | n/a |
| workhorse (OpenRouter free model) | ⚠️ rate-limited | the new 429 retry is what this PR adds to make multi-call sessions reliable |
| azure | ⚠️ Connection error | .env has placeholder AZURE_ENDPOINT='noop…'; framework path itself constructs AzureChatOpenAI cleanly |

🤖 Generated with Claude Code

aksOps added 4 commits May 14, 2026 16:19
Free / shared upstream tiers (e.g. OpenRouter ``…:free`` models)
throttle on short windows that need 30-60s to clear. The existing
5xx backoff (1.5s/3s/4.5s, total ~9s) exhausts retries before the
window opens again, surfacing the 429 as an EnvelopeMissingError
or a hard ``agent failed`` row.

Split ``_ainvoke_with_retry`` into two backoff regimes:
  * 5xx + connection-reset markers: existing ``base_delay`` (1.5s)
    → 1.5s / 3.0s / 4.5s
  * 429 / rate-limit markers: new ``rate_limit_base_delay`` (7.5s)
    → 7.5s / 15.0s / 22.5s (total ~45s before raising)
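
The two schedules above follow the same linear pattern (base delay times the attempt number). A minimal sketch of that arithmetic, assuming three delays per regime; `linear_backoff` is a hypothetical helper name, not a function from this repository:

```python
from __future__ import annotations


def linear_backoff(base_delay: float, retries: int = 3) -> list[float]:
    """Return the delay before each retry: base * 1, base * 2, base * 3, ..."""
    return [base_delay * attempt for attempt in range(1, retries + 1)]


# 5xx / connection-reset regime (base_delay = 1.5s)
transient = linear_backoff(1.5)
# 429 / rate-limit regime (rate_limit_base_delay = 7.5s)
rate_limited = linear_backoff(7.5)

print(transient, sum(transient))
print(rate_limited, sum(rate_limited))
```

The sums (9s versus 45s) show why only the longer regime can outlast a throttle window that needs 30-60s to clear.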

``_RATE_LIMIT_MARKERS`` covers the variants real providers emit:
``status code: 429``, ``error code: 429``, the bare ``" 429"`` /
``"429 "`` (with space-guard against false positives like 1429),
``ratelimiterror`` (langchain's exception class name), ``rate
limit`` / ``rate-limited``, and ``too many requests``.
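
A rough sketch of how substring markers like these could classify an exception. The marker strings are the ones listed above; matching against the lowercased class name plus message is an assumption about the approach, and `RATE_LIMIT_MARKERS_SKETCH` and `looks_rate_limited` are hypothetical names:

```python
# Marker strings taken from the commit message; the matching strategy is assumed.
RATE_LIMIT_MARKERS_SKETCH = (
    "status code: 429",
    "error code: 429",
    " 429",   # leading space: avoids matching the tail of "1429"
    "429 ",
    "ratelimiterror",
    "rate limit",
    "rate-limited",
    "too many requests",
)


def looks_rate_limited(exc: BaseException) -> bool:
    """True if the exception's class name or message contains any marker."""
    text = f"{type(exc).__name__}: {exc}".lower()
    return any(marker in text for marker in RATE_LIMIT_MARKERS_SKETCH)


class RateLimitError(Exception):
    """Stand-in for a provider SDK exception with this class name."""


print(looks_rate_limited(RuntimeError("Error code: 429")))
print(looks_rate_limited(RateLimitError("slow down")))
print(looks_rate_limited(RuntimeError("401 unauthorized")))
```

Including the class name in the searched text is what lets the ``ratelimiterror`` marker fire even when the message itself never mentions 429.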

Non-429 4xx (401 unauthorized, 422 schema validation, etc.) keep
their fast-fail behaviour — retrying a quota / auth / schema error
just wastes time and masks the real problem.
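
Putting the three behaviours together (short backoff, long backoff, fast-fail), a retry loop could look roughly like the following. This is a sketch under assumptions, not the actual ``_ainvoke_with_retry``: the function name, the abbreviated marker tuples, the 5xx marker strings, the `max_attempts=3` default, and the injectable `sleep` parameter are all hypothetical. Sleeping after the final failed attempt is inferred from the "total ~45s before raising" wording.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Abbreviated, hypothetical marker sets for illustration only.
RATE_LIMIT_MARKERS = ("status code: 429", "error code: 429", "rate limit", "too many requests")
TRANSIENT_MARKERS = ("status code: 500", "status code: 502", "status code: 503", "connection reset")


async def invoke_with_retry_sketch(
    call: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 3,
    base_delay: float = 1.5,
    rate_limit_base_delay: float = 7.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry `call` with a per-regime linear backoff; re-raise anything else at once."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception as exc:
            text = str(exc).lower()
            if any(marker in text for marker in RATE_LIMIT_MARKERS):
                delay = rate_limit_base_delay * attempt  # 7.5s / 15s / 22.5s
            elif any(marker in text for marker in TRANSIENT_MARKERS):
                delay = base_delay * attempt  # 1.5s / 3s / 4.5s
            else:
                raise  # auth / schema / quota errors: fail fast
            last_exc = exc
            await sleep(delay)
    assert last_exc is not None
    raise last_exc
```

Passing a fake `sleep` coroutine in tests lets the delay progression be asserted without actually waiting.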

5 new tests in ``tests/test_ainvoke_retry_429.py``:
  * ``test_retries_on_5xx_and_returns_eventually`` — pins the
    short-backoff path stays at 1.5s.
  * ``test_retries_on_429_with_longer_backoff`` — pins the 7.5s/15s
    progression.
  * ``test_429_phrasings_all_match`` — exercises every marker.
  * ``test_non_transient_error_propagates_without_retry`` — fast-fail
    on 401.
  * ``test_429_exhausts_max_attempts_then_raises`` — bounded retry,
    no infinite loop.

Suite: 1265 passed (was 1260, added 5), ruff clean.

Two issues caught while live-validating v1.5-C against real
providers:

1. **Stale skill prompt.** The S1 driver's ``responder`` skill was
   written in the Phase 15 (response_format JSON) era; its
   system_prompt told the LLM "respond in one sentence" with no
   markdown contract instructions. Phase 22 (markdown-primary turn
   output) made that fail with ``EnvelopeMissingError`` because the
   parser has nothing to lift. Add the
   ``## Response`` / ``## Confidence`` / ``## Signal`` contract
   block to the prompt — same pattern as the production skill
   prompts under ``examples/incident_management/skills/*/system.md``.
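
To illustrate why a one-sentence reply leaves the parser with "nothing to lift", here is a small sketch of a section parser for the three headings named above. It is not the project's parser: `lift_sections` and `EnvelopeMissingSketch` are hypothetical stand-ins, and the sample section values are made up.

```python
from __future__ import annotations

import re

REQUIRED_SECTIONS = ("Response", "Confidence", "Signal")


class EnvelopeMissingSketch(ValueError):
    """Stand-in for the project's EnvelopeMissingError."""


def lift_sections(markdown: str) -> dict[str, str]:
    """Split on level-2 headings and require all three contract sections."""
    # re.split with one capture group yields [preamble, heading, body, heading, body, ...]
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    sections = {
        parts[i].strip(): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise EnvelopeMissingSketch(f"missing sections: {', '.join(missing)}")
    return sections


# Sample reply; the section contents are illustrative values only.
reply = "## Response\nDisk is full on db-1.\n\n## Confidence\n0.8\n\n## Signal\nescalate\n"
print(lift_sections(reply))
```

A reply such as "Disk is full on db-1." contains no headings at all, so every required section is reported missing.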

2. **No Azure parametrize arm.** The driver covered ``workhorse``
   (OpenRouter) + ``local`` (Ollama). Azure has been first-class in
   ``runtime.llm.get_llm`` since Phase 13 but had no live verification
   path. Add an ``azure`` parametrize arm that constructs an
   ``AzureChatOpenAI`` from ``AZURE_OPENAI_KEY`` + ``AZURE_ENDPOINT``
   + ``AZURE_DEPLOYMENT`` (defaults to ``gpt-4o``).

Per-leg skip semantics: each arm independently skips when its keys
are absent. Replaces the global ``pytestmark.skipif`` that required
ALL three keys for any leg to run — partial-key environments now
exercise whichever providers they can reach. Drops the
``_OPENROUTER_KEY and _OLLAMA_KEY and _OLLAMA_BASE_URL`` global
gate; the per-leg gate inside the test body owns it.
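
The per-leg gate can be pictured as a small lookup from leg name to the environment variables it needs. In this sketch the Azure variable names come from the text above, while `OPENROUTER_API_KEY`, `OLLAMA_API_KEY`, and `OLLAMA_BASE_URL` are assumed names (the commit message only gives the Python-side variables); `missing_env` and `runnable_legs` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Mapping

LEG_REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "workhorse": ("OPENROUTER_API_KEY",),              # assumed env var name
    "local": ("OLLAMA_API_KEY", "OLLAMA_BASE_URL"),    # assumed env var names
    "azure": ("AZURE_OPENAI_KEY", "AZURE_ENDPOINT"),   # named in the commit message
}


def missing_env(leg: str, env: Mapping[str, str]) -> list[str]:
    """Names of required variables that are unset or empty for this leg."""
    return [name for name in LEG_REQUIRED_ENV[leg] if not env.get(name)]


def runnable_legs(env: Mapping[str, str]) -> list[str]:
    """Legs whose keys are all present; the others would be skipped individually."""
    return [leg for leg in LEG_REQUIRED_ENV if not missing_env(leg, env)]


# In a parametrized pytest test body, the gate would be something like:
#     if missing_env(leg, os.environ):
#         pytest.skip(f"{leg}: missing keys")
partial_env = {"OLLAMA_API_KEY": "k", "OLLAMA_BASE_URL": "https://example.invalid"}
print(runnable_legs(partial_env))
```

With only the Ollama variables set, the `local` leg runs and the other two skip, which is the partial-key behaviour the old global gate prevented.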

The ``LLMConfig`` builder also handles a fully-keyless environment
by falling through to a stub provider so config validation passes
during test collection.

Live verification status (with the keys in this dev environment):
  * ``local`` — PASSES against Ollama Cloud gpt-oss:20b
  * ``workhorse`` — fails on credit / rate-limit (account-specific)
  * ``azure`` — fails on connection error (placeholder endpoint in
    .env; framework path itself is intact)

…6-1t:free

Demonstrates the v1.5-C per-agent provider story end-to-end with
two REAL providers in flight:
  * intake (skill override)        → Ollama Cloud gpt-oss:20b
  * triage / DI / resolution (default) → OpenRouter inclusionai/ring-2.6-1t:free
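
The routing above amounts to "use the skill's override if it has one, otherwise the default". A minimal sketch of that lookup; the dictionary layout, the `local` provider key for the Ollama entry, and `resolve_model` are assumptions for illustration and do not mirror the real config schema:

```python
from __future__ import annotations

DEFAULT_PROVIDER = "workhorse"  # llm.default after this change

# Provider name -> model; the "local" key for Ollama Cloud is assumed.
PROVIDER_MODELS = {
    "workhorse": "inclusionai/ring-2.6-1t:free",
    "local": "gpt-oss:20b",
}

# Per-skill overrides; only intake is overridden in the flow described above.
SKILL_OVERRIDES = {"intake": "local"}


def resolve_model(skill: str) -> tuple[str, str]:
    """Return (provider, model) for a skill, falling back to the default provider."""
    provider = SKILL_OVERRIDES.get(skill, DEFAULT_PROVIDER)
    return provider, PROVIDER_MODELS[provider]


for skill in ("intake", "triage", "resolution"):
    print(skill, resolve_model(skill))
```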

The free OpenRouter tier rate-limits aggressively; the
preceding ``feat(retry)`` commit's 429 backoff (7.5s/15s/22.5s)
keeps multi-agent INC runs working through transient throttles.

Operators on a paid OpenRouter plan should swap the model back to
``openai/gpt-4o-mini`` (or any other paid model) — the rest of
the registry is unchanged.

Bundles dist/app.py + dist/apps/{code-review,incident-management}.py
in line with the ``runtime.graph._RATE_LIMIT_MARKERS`` +
``_ainvoke_with_retry`` rate-limit branch from the preceding feat
commit. No bundle-only edits.

aksOps merged commit adefae6 into main on May 14, 2026
8 checks passed
aksOps deleted the feat/rate-limit-retry-and-multi-provider-validation branch on May 14, 2026 16:29